Use High Performance Computing (HPC) to simulate the dynamics of real-life engineering mechanical systems at unprecedented levels of accuracy


HPC hardware targeted:

- Cluster of CPUs and GPUs (accelerators)

• More than 100 CPU cores, tens of GPU cards, tens of thousands of GPU cores
• Overview of the engineering problems of interest

• Large-scale Multibody Dynamics

- Problem formulation, solution method, and parallel implementation

• Overview of Heterogeneous Computing Template (HCT)

• Numerical Experiments

• Conclusions
Fluid-Solid Interaction: Navier-Stokes + Newton-Euler.
• Wheeled/tracked vehicle mobility on granular terrain

• Also interested in scooping and loading granular material
Frictional Contact Simulation

[Commercial Solution]


• Model Parameters:

- Spheres: 60 mm diameter and mass 0.882 kg

- Forces: smoothing with stiffness of 1E5, force exponent of 2.2, damping coefficient of 10.0, and a penetration depth of 0.1

- Simulation length: 3 seconds
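The force parameters listed above (stiffness, force exponent, damping coefficient, penetration depth) are the inputs of a penalty ("smoothing") contact law. A minimal sketch, assuming the common form F = k·δ^e + c(δ)·δ̇, where δ is the penetration, δ̇ its rate of growth, and the damping c(δ) ramps smoothly from 0 to the damping coefficient as δ goes from 0 to the penetration depth so the force does not jump at first contact. The function names and the cubic ramp are illustrative assumptions, not the commercial code's implementation, and the units of the slide's values are not stated.

```python
def smooth_step(x, x0, h0, x1, h1):
    """Cubic ramp from h0 (for x <= x0) to h1 (for x >= x1), zero slope at both ends."""
    if x <= x0:
        return h0
    if x >= x1:
        return h1
    s = (x - x0) / (x1 - x0)
    return h0 + (h1 - h0) * s * s * (3.0 - 2.0 * s)


def penalty_normal_force(delta, delta_dot, k=1e5, e=2.2, c_max=10.0, d=0.1):
    """Normal contact force for penetration delta (> 0 when the bodies overlap).

    delta_dot > 0 means the penetration is growing, so damping resists approach.
    Defaults are the slide's values: stiffness 1E5, exponent 2.2,
    damping coefficient 10.0, penetration depth 0.1 (units not stated).
    """
    if delta <= 0.0:
        return 0.0  # bodies separated: no force
    c = smooth_step(delta, 0.0, 0.0, d, c_max)  # damping ramps in over depth d
    force = k * delta ** e + c * delta_dot
    return max(force, 0.0)  # a contact can push but never pull
```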
Frictional Contact: Two Different Approaches
• Discrete Element Method (DEM) - draws on a “smoothing” (penalty) approach

  - Lots of heuristics

  - Slow

  - General purpose

  - Used in ADAMS

• DVI-based (Differential Variational Inequalities)

  - A set of differential equations combined with inequality constraints

  - Fast (stable for significantly larger integration step-sizes)

  - Less general purpose

  - Used widely in computer games
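A rough illustration of why the penalty (DEM) approach is slow: with an explicit-type integrator, a stiff contact spring caps the usable step size. This sketch assumes a linear, undamped spring of stiffness k acting on a mass m and integrated with semi-implicit (symplectic) Euler; the slides do not say which integrators either approach uses, and implicit integrators behave differently. The one-step amplification matrix has trace 2 − h²k/m and determinant 1, so the motion stays bounded only while h < 2·sqrt(m/k).

```python
import cmath
import math


def spectral_radius_symplectic_euler(h, k, m):
    """Largest |eigenvalue| of one semi-implicit Euler step for m*y'' = -k*y."""
    trace = 2.0 - h * h * k / m   # the step matrix always has determinant 1
    disc = cmath.sqrt(trace * trace - 4.0)
    roots = ((trace + disc) / 2.0, (trace - disc) / 2.0)
    return max(abs(r) for r in roots)


def max_stable_step(k, m):
    """Step-size limit h < 2 / omega, with omega = sqrt(k / m)."""
    return 2.0 * math.sqrt(m / k)


# Example values: k = 1E5 treated as a linear stiffness (an assumption)
# and the slide's sphere mass of 0.882 kg.
h_max = max_stable_step(1e5, 0.882)   # about 5.9e-3
```

The bound shrinks like sqrt(m/k), so stiffer contacts or lighter bodies force smaller steps. A DVI formulation enforces non-penetration through inequality constraints rather than a spring, so this particular limit does not arise there.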
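Generalized positions q, generalized velocities v, generalized mass matrix M(q), velocity transformation matrix L(q):

$$\dot{\mathbf{q}} = \mathbf{L}(\mathbf{q})\,\mathbf{v}$$

$$\mathbf{M}(\mathbf{q})\,\dot{\mathbf{v}} = \mathbf{f}(t,\mathbf{q},\mathbf{v}) - \mathbf{g}_{\mathbf{q}}^{T}(\mathbf{q},t)\,\hat{\boldsymbol{\lambda}} + \sum_{i=1}^{N_c}\left(\hat{\gamma}_{i,n}\mathbf{D}_{i,n} + \hat{\gamma}_{i,u}\mathbf{D}_{i,u} + \hat{\gamma}_{i,w}\mathbf{D}_{i,w}\right)$$

$$\mathbf{g}(\mathbf{q},t) = \mathbf{0}$$

$$0 \le \Phi_i(\mathbf{q},t) \;\perp\; \hat{\gamma}_{i,n} \ge 0, \qquad i = 1,2,\ldots,N_c$$

$$\left(\hat{\gamma}_{i,u},\hat{\gamma}_{i,w}\right) = \underset{\sqrt{\hat{\gamma}_{i,u}^{2}+\hat{\gamma}_{i,w}^{2}}\,\le\,\mu_i\hat{\gamma}_{i,n}}{\operatorname{argmin}}\; \mathbf{v}^{T}\left(\hat{\gamma}_{i,u}\mathbf{D}_{i,u} + \hat{\gamma}_{i,w}\mathbf{D}_{i,w}\right)$$

- f(t, q, v): applied force
- −g_qᵀ(q, t) λ̂: reaction force
- γ̂_{i,n} D_{i,n} + γ̂_{i,u} D_{i,u} + γ̂_{i,w} D_{i,w}: frictional contact force for contact "i"
- γ̂_{i,n}: contact impulse for contact "i"
- γ̂_{i,u}, γ̂_{i,w}: friction impulse components for contact "i"
- μ_i: friction coefficient for contact "i"
- Φ_i(q, t): gap function for contact "i"
- N_c: total number of contacts
- argmin objective: friction dissipation energy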
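The friction argmin above has a closed-form minimizer whenever the contact is slipping: a linear objective over a disk of radius μ_i γ̂_{i,n} is minimized on the rim, directly opposite the tangential slip velocity. A minimal sketch, assuming slip_u and slip_w stand for vᵀD_{i,u} and vᵀD_{i,w}; when the contact sticks (zero slip) every impulse in the disk attains the minimum, so the argmin alone does not determine the impulse and the function returns None. The function name is illustrative.

```python
import math


def max_dissipation_friction(slip_u, slip_w, mu, gamma_n):
    """Minimize slip_u*gu + slip_w*gw subject to sqrt(gu^2 + gw^2) <= mu*gamma_n.

    Returns (gu, gw), or None when there is no slip and the minimizer is not unique.
    """
    speed = math.hypot(slip_u, slip_w)
    if speed == 0.0:
        return None  # sticking: any impulse in the friction disk gives objective 0
    scale = -mu * gamma_n / speed  # full-magnitude impulse opposing the slip
    return scale * slip_u, scale * slip_w
```

For a slip of (3, 4), μ = 0.5, and γ̂_n = 2, the impulse is (−0.6, −0.8) and the dissipation term equals −μγ̂_n·|slip| = −5.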
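Traditional Discretization Scheme

MODELING & SIMULATION, TESTING and VALIDATION (MSTV)

Positions q and speeds v are advanced from time step index (l) to (l+1) with step size h; M is the mass matrix, and 𝒜(q^(l), ε) is the set of contacts whose gap is within a small threshold ε:

$$\mathbf{q}^{(l+1)} = \mathbf{q}^{(l)} + h\,\mathbf{L}(\mathbf{q}^{(l)})\,\mathbf{v}^{(l+1)}$$

$$\mathbf{M}\left(\mathbf{v}^{(l+1)} - \mathbf{v}^{(l)}\right) = h\,\mathbf{f}(t^{(l)},\mathbf{q}^{(l)},\mathbf{v}^{(l)}) + \sum_{i\in\mathcal{A}(\mathbf{q}^{(l)},\epsilon)}\left(\gamma_{i,n}\mathbf{D}_{i,n} + \gamma_{i,u}\mathbf{D}_{i,u} + \gamma_{i,w}\mathbf{D}_{i,w}\right)$$

$$i\in\mathcal{A}(\mathbf{q}^{(l)},\epsilon):\quad 0 \le \frac{1}{h}\Phi_i(\mathbf{q}^{(l)}) + \mathbf{D}_{i,n}^{T}\mathbf{v}^{(l+1)} \;\perp\; \gamma_{i,n} \ge 0$$

$$\left(\gamma_{i,u},\gamma_{i,w}\right) = \underset{\mu_i\gamma_{i,n}\,\ge\,\sqrt{\gamma_{i,u}^{2}+\gamma_{i,w}^{2}}}{\operatorname{argmin}}\; \mathbf{v}^{(l+1),T}\left(\gamma_{i,u}\mathbf{D}_{i,u} + \gamma_{i,w}\mathbf{D}_{i,w}\right)$$

The contact impulses γ_{i,n}, γ_{i,u}, γ_{i,w} supply the reaction terms in the velocity update.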
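Relaxed Discretization Scheme

$$\mathbf{q}^{(l+1)} = \mathbf{q}^{(l)} + h\,\mathbf{L}(\mathbf{q}^{(l)})\,\mathbf{v}^{(l+1)}$$

$$\mathbf{M}\left(\mathbf{v}^{(l+1)} - \mathbf{v}^{(l)}\right) = h\,\mathbf{f}(t^{(l)},\mathbf{q}^{(l)},\mathbf{v}^{(l)}) + \sum_{i\in\mathcal{A}(\mathbf{q}^{(l)},\epsilon)}\left(\gamma_{i,n}\mathbf{D}_{i,n} + \gamma_{i,u}\mathbf{D}_{i,u} + \gamma_{i,w}\mathbf{D}_{i,w}\right)$$

$$i\in\mathcal{A}(\mathbf{q}^{(l)},\epsilon):\quad 0 \le \frac{1}{h}\Phi_i(\mathbf{q}^{(l)}) + \mathbf{D}_{i,n}^{T}\mathbf{v}^{(l+1)} - \mu_i\sqrt{\left(\mathbf{v}^{(l+1),T}\mathbf{D}_{i,u}\right)^{2} + \left(\mathbf{v}^{(l+1),T}\mathbf{D}_{i,w}\right)^{2}} \;\perp\; \gamma_{i,n} \ge 0$$

$$\left(\gamma_{i,u},\gamma_{i,w}\right) = \underset{\mu_i\gamma_{i,n}\,\ge\,\sqrt{\gamma_{i,u}^{2}+\gamma_{i,w}^{2}}}{\operatorname{argmin}}\; \mathbf{v}^{(l+1),T}\left(\gamma_{i,u}\mathbf{D}_{i,u} + \gamma_{i,w}\mathbf{D}_{i,w}\right)$$

Relaxation term: $-\mu_i\sqrt{\left(\mathbf{v}^{(l+1),T}\mathbf{D}_{i,u}\right)^{2} + \left(\mathbf{v}^{(l+1),T}\mathbf{D}_{i,w}\right)^{2}}$ (Anitescu & Tasora, 2008)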
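For a single sphere over a plane with μ_i = 0, the relaxation term vanishes and one step of the scheme can be solved in closed form. A minimal sketch, assuming a translating sphere of mass m and radius r (so L is the identity and M = mI), the ground plane z = 0 with D_n = (0, 0, 1), and gap Φ = z − r; the function name and the default threshold eps are illustrative. With v* the velocity after the applied-force update, the impulse γ_n = max(0, −m(Φ/h + D_nᵀv*)) is exactly the value that makes the complementarity condition hold.

```python
import numpy as np


def frictionless_sphere_step(q, v, f, m, r, h, eps=0.05):
    """One step of the scheme for a sphere over the plane z = 0, with mu = 0.

    q, v, f: position, velocity, applied force (3-vectors); m: mass; r: radius.
    Returns (q_next, v_next, gamma_n).
    """
    n = np.array([0.0, 0.0, 1.0])   # D_n for a plane contact: the plane normal
    v_free = v + h * f / m          # velocity update from applied forces only
    gap = q[2] - r                  # gap function Phi
    gamma_n = 0.0
    if gap <= eps:                  # contact belongs to the active set A(q, eps)
        # smallest non-negative impulse giving gap/h + n^T v_next >= 0
        gamma_n = max(0.0, -m * (gap / h + n @ v_free))
    v_next = v_free + gamma_n * n / m
    q_next = q + h * v_next         # positions advance with the new velocity
    return q_next, v_next, gamma_n
```

A resting sphere receives just enough impulse to cancel gravity over the step, and a fast-falling sphere is stopped so that it lands exactly on the plane at the end of the step instead of penetrating it.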
The Cone Complementarity Problem [CCP]

• The overall approach assumes the form of a Cone Complementarity Problem (CCP)

Introduce the convex hypercone...

$$\Upsilon = \bigoplus_{i\in\mathcal{A}(\mathbf{q}^{(l)},\epsilon)} \mathcal{FC}_i$$

...and its polar hypercone:

$$\Upsilon^{\circ} = \bigoplus_{i\in\mathcal{A}(\mathbf{q}^{(l)},\epsilon)} \mathcal{FC}_i^{\circ}$$

$\mathcal{FC}_i \subset \mathbb{R}^3$ represents the friction cone associated with the i-th contact.

The CCP assumes the following form: find $\boldsymbol{\gamma}$ such that

$$\Upsilon \ni \boldsymbol{\gamma} \;\perp\; -\left(\mathbf{N}\boldsymbol{\gamma} + \mathbf{d}\right) \in \Upsilon^{\circ}$$
Numerical solution can leverage parallel computing
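The per-contact structure of Υ is what makes parallel solution attractive: projecting onto Υ splits into independent projections onto each friction cone FC_i. A minimal sketch of a projected Jacobi fixed-point iteration, γ ← Π_Υ(γ − ωB(Nγ + d)), one common way to attack this CCP; N, d, and the friction coefficients μ_i are treated as given inputs, while the relaxation ω and the block scaling B = diag(η_i I₃) with η_i = 3/trace(N_ii) are illustrative choices. The inner NumPy loop stands in for a GPU kernel with one thread per contact; this is not the HCT solver, and convergence is not guaranteed for every ω.

```python
import numpy as np


def project_friction_cone(x, mu):
    """Euclidean projection of x = (g_n, g_u, g_w) onto {sqrt(g_u^2 + g_w^2) <= mu * g_n}."""
    xn, xt = x[0], x[1:]
    t = np.linalg.norm(xt)
    if t <= mu * xn:                 # already inside the cone
        return x.copy()
    if mu * t <= -xn:                # inside the polar cone: nearest point is the apex
        return np.zeros_like(x)
    n_proj = (xn + mu * t) / (1.0 + mu * mu)   # nearest point on the cone surface
    return np.concatenate(([n_proj], (mu * n_proj / t) * xt))


def solve_ccp_projected_jacobi(N, d, mu, omega=1.0, iters=200):
    """Approximate gamma in Upsilon with gamma perp -(N gamma + d) in the polar cone."""
    p = len(mu)                      # number of active contacts, 3 unknowns each
    gamma = np.zeros(3 * p)
    eta = [3.0 / np.trace(N[3*i:3*i+3, 3*i:3*i+3]) for i in range(p)]
    for _ in range(iters):
        r = N @ gamma + d            # uses only the previous iterate (Jacobi)
        new = np.empty_like(gamma)
        for i in range(p):           # independent per contact: one GPU thread each
            s = slice(3 * i, 3 * i + 3)
            new[s] = project_friction_cone(gamma[s] - omega * eta[i] * r[s], mu[i])
        gamma = new
    return gamma
```

When contacts share bodies, N has off-diagonal blocks, and a Jacobi sweep may then need ω < 1 to converge.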


1 August 2011 UNCLASSIFIED: Dist. A. Approved for public release


GVSETS 


CPU vs. GPU - Flop Rate [GFlop/sec], 2003-2010

[Figure: CPU in orange, GPU in blue; single- and double-precision data]
300K Spheres in Tank

[parallel on the GPU]
Computational dynamics

Tracked vehicle mobility


Simulation Setup:

• Driving speed: 1.0 rad/sec

• Length: 12 seconds

• Time step: 0.005 sec

• Computation time: 18.5 hours

• Particle radius: 0.027273 m

• Terrain: 284,715 particles
Track Simulation


Parameters:

• Driving speed: 1.0 rad/sec

• Length: 10 seconds

• Time step: 0.005 sec

• Computation time: 17.8 hours

• Particle radius: 0.025 ± 0.0025 m

• Terrain: 467,100 particles
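The setup values of the two tracked-vehicle runs imply the number of time steps and the average wall-clock cost per step. A small sketch of that arithmetic; the helper name is illustrative, and the per-step figure is a plain average over the whole run, not a timing breakdown reported in the slides.

```python
def run_cost(length_s, step_s, wall_hours):
    """Return (number of time steps, average wall-clock seconds per step)."""
    steps = round(length_s / step_s)
    return steps, wall_hours * 3600.0 / steps


mobility = run_cost(12.0, 0.005, 18.5)   # 12 s simulated, 284,715 particles
track = run_cost(10.0, 0.005, 17.8)      # 10 s simulated, 467,100 particles
```

This gives 2,400 steps at about 27.75 s each for the mobility run and 2,000 steps at about 32.04 s each for the track run.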
Track Footprint
A Heterogeneous Computing Template for Computational Dynamics
[Diagram: cluster connected through a Gigabit Ethernet Switch]

Second-fastest cluster at the University of Wisconsin-Madison
[DEM solution]
Computation Using Multiple CPUs

[DEM solution]
Simulation of MRAP impacted by debris




Heterogeneous Computing Template: Five Major Components




Computational Dynamics requires:

- Domain decomposition
- Proximity computation
- Inter-domain data exchange
- Numerical algorithm support
- Post-processing (visualization)

HCT represents the library support and associated API that capture this five-component abstraction
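One way to picture the five-component abstraction is as an interface that a simulation loop calls once per time step. A hypothetical sketch: the class name, method names, call order, and the toy subclass are illustrative assumptions, not the HCT API.

```python
from abc import ABC, abstractmethod


class ComputationalDynamicsTemplate(ABC):
    """Hypothetical interface mirroring the five HCT components."""

    @abstractmethod
    def decompose_domain(self, bodies):
        """Split the bodies among compute resources; return a list of subdomains."""

    @abstractmethod
    def compute_proximity(self, subdomain):
        """Find contacts (collision detection) within one subdomain."""

    @abstractmethod
    def exchange_data(self, subdomains):
        """Share boundary information between neighboring subdomains."""

    @abstractmethod
    def solve(self, subdomain, contacts):
        """Run the numerical algorithm (e.g., a CCP solver) for one subdomain."""

    @abstractmethod
    def post_process(self, subdomains):
        """Produce output for visualization."""

    def step(self, bodies):
        """One time step: the five components called in a plausible order."""
        subdomains = self.decompose_domain(bodies)
        contacts = [self.compute_proximity(s) for s in subdomains]
        self.exchange_data(subdomains)
        for s, c in zip(subdomains, contacts):
            self.solve(s, c)
        return self.post_process(subdomains)


class CallRecorder(ComputationalDynamicsTemplate):
    """Toy subclass that only records which component ran."""

    def __init__(self):
        self.log = []

    def decompose_domain(self, bodies):
        self.log.append("decompose")
        half = len(bodies) // 2
        return [bodies[:half], bodies[half:]]   # two subdomains

    def compute_proximity(self, subdomain):
        self.log.append("proximity")
        return []

    def exchange_data(self, subdomains):
        self.log.append("exchange")

    def solve(self, subdomain, contacts):
        self.log.append("solve")

    def post_process(self, subdomains):
        self.log.append("post")
        return sum(len(s) for s in subdomains)


recorder = CallRecorder()
n_bodies = recorder.step(list(range(10)))
```

Keeping the step loop in the base class and the five components abstract lets a CPU cluster back end and a multi-GPU back end share the same driver.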
• Frictionless case (bound constraints in place)

- Gauss-Jacobi (CE)

- Projected conjugate gradient (ProjCG)

- Gradient projected conjugate gradient (GPCG)

- Gradient projected MINRES (GPMINRES)


• Friction case (cone constraints - ongoing)

- Newton’s Method for large bound-constrained problems

• Uses re-parameterization to handle friction cones (replacing them with bound constraints)
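In the frictionless case the problem becomes bound-constrained: minimize ½γᵀNγ + dᵀγ subject to γ ≥ 0. A minimal sketch of the simplest member of the gradient-projection family, plain projected gradient descent with a fixed step 1/λ_max(N), assuming N is symmetric positive semidefinite and d is given. It is not GPCG or GPMINRES, which add conjugate-gradient or MINRES steps on the currently free variables to speed up convergence.

```python
import numpy as np


def projected_gradient(N, d, iters=500):
    """Minimize 0.5*g^T N g + d^T g subject to g >= 0 (N symmetric PSD)."""
    step = 1.0 / np.linalg.eigvalsh(N)[-1]    # 1 / largest eigenvalue of N
    g = np.zeros_like(d)
    for _ in range(iters):
        grad = N @ g + d
        g = np.maximum(g - step * grad, 0.0)  # gradient step, then clip to the bound
    return g
```

For N = [[2, 1], [1, 2]] and d = (−1, 1), the unconstrained minimizer is (1, −1), which violates the bound, while the projected iteration converges to (0.5, 0).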
• Test Problem: 40,000 bodies → 157,520 contacts

• Frictionless
[Figure: residual norm]


Test Problem (MATLAB)

Method     Iterations   Final Residual Norm   γ_min     γ_max    Time [sec]
CE         1000         6.11 × 10^-2          0.0       2.0598   1849.5
ProjCG     1002         5.6344 × 10^-4        0.0       2.2286   1235.6
GPCG       1600         1.0675 × 10^-4        0.0       2.6349   382.3644
GPMinres   1100         9.5239 × 10^-5        0.0       2.3090   238.0744
PCG        1000         2.4053 × 10^-4        -1.1116   2.5254   27.9686
GMRES      1000         4.5315 × 10^-5        -1.1635   2.5227   736.3007
MINRES     1000         1.6979 × 10^-5        -1.1316   2.5253   41.5790
• 30,000-foot perspective:

- Carry out spatial partitioning of the volume occupied by the bodies

• Place bodies in bins (cubes, for instance)

- Follow up by brute-force search for all bodies touching each bin

• Embarrassingly parallel
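The binning approach above can be sketched for spheres on a uniform grid of cubic bins. A minimal serial sketch, assuming spheres given as (center, radius) and a bin edge length chosen by the caller; the function name is illustrative. Each sphere is registered in every bin its bounding box touches, so any two overlapping spheres share at least one bin, and pairs found in several bins are de-duplicated with a set. Because bins are processed independently, a GPU version can assign bins (or spheres) to separate threads.

```python
from collections import defaultdict
from itertools import combinations, product
import math


def binned_sphere_contacts(spheres, bin_size):
    """Return the set of index pairs (i, j), i < j, of overlapping spheres.

    spheres: list of ((x, y, z), radius).
    """
    bins = defaultdict(list)
    # Spatial partitioning: register each sphere in every bin its bounding box touches.
    for idx, (center, radius) in enumerate(spheres):
        lo = [math.floor((c - radius) / bin_size) for c in center]
        hi = [math.floor((c + radius) / bin_size) for c in center]
        for key in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            bins[key].append(idx)

    contacts = set()
    # Brute-force search inside each bin; bins are independent of one another.
    for members in bins.values():
        for i, j in combinations(members, 2):
            (ci, ri), (cj, rj) = spheres[i], spheres[j]
            if math.dist(ci, cj) < ri + rj:
                contacts.add((min(i, j), max(i, j)))
    return contacts
```

The bin size trades memory for work: bins much larger than the spheres make the per-bin brute-force search expensive, while much smaller bins register each sphere many times.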
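$$\mathbf{d} = \mathbf{P}_1 - \mathbf{P}_2$$

$$\frac{\partial\mathbf{d}}{\partial\alpha_i} = \frac{\partial\mathbf{P}_1}{\partial\alpha_i} - \frac{\partial\mathbf{P}_2}{\partial\alpha_i}, \qquad \frac{\partial^{2}\mathbf{d}}{\partial\alpha_i\,\partial\alpha_j} = \frac{\partial^{2}\mathbf{P}_1}{\partial\alpha_i\,\partial\alpha_j} - \frac{\partial^{2}\mathbf{P}_2}{\partial\alpha_i\,\partial\alpha_j}$$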
Ellipsoid-Ellipsoid CD: Results
Speedup GPU vs. CPU

[results reported are for spheres]

GPU: NVIDIA Tesla C1060
CPU: AMD Phenom II Black X4 940 (3.0 GHz)

[Figure: speedup vs. contacts (millions)]
Parallel Implementation: Number of Contacts vs. Detection Time

[results reported are for spheres]
Assembled Quad GPU Machine

Processor: AMD Phenom II X4 940 Black
Memory: 16GB DDR2
Graphics: 4x NVIDIA Tesla C1060
Power supply 1: 1000W
Power supply 2: 750W
OpenMP

[Diagram: Main Data Set and Results exchanged between the Quad Core AMD Microprocessor and four Tesla C1060 cards (4x4 GB memory, 4x30720 threads)]
Results - Contacts vs. Time
Quad Tesla C1060 Configuration

[Figure: detection time vs. contacts (millions)]
• HPC will soon enable simulation of billion-body problems

- Tremendous advances in compute power over the last five years

• Our work: Heterogeneous Computing Template (HCT)

- HCT draws on the symbiosis of CPU and GPU computing


• Accomplishments to date

- Billion-body parallel collision detection

- Large-scale parallel solution of the cone complementarity problem

- Early validation results are encouraging
- Validation efforts at CAT, US Army, and JPL

- Massively parallel linear algebra for the solution of the CCP

• Preconditioned gradient projected Krylov method

- Parallel collision detection for complex geometries

- Multiphysics:

• Fluid-solid interaction

• Cohesion

• Electrostatics